Using Generalization Error Bounds to Train the Set Covering Machine

Authors

  • Zakria Hussain
  • John Shawe-Taylor
Abstract

In this paper we eliminate the need for parameter estimation associated with the set covering machine (SCM) by directly minimizing generalization error bounds. First, we consider a sub-optimal greedy heuristic algorithm termed the bound set covering machine (BSCM). Next, we propose the branch and bound set covering machine (BBSCM) and prove that it finds a classifier producing the smallest generalization error bound. We further justify the BBSCM algorithm empirically through a heuristic relaxation, called BBSCM(τ), which guarantees a solution whose bound is within a factor τ of the optimal. Experiments comparing against the support vector machine (SVM) and SCM algorithms demonstrate that the proposed approaches can lead to some or all of the following: 1) faster running times, 2) sparser classifiers and 3) competitive generalization error, all while avoiding the need for parameter estimation.
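
To make the bound-driven training concrete, here is a minimal sketch of a BSCM-style greedy loop: at each step, add the feature whose inclusion most decreases a generalization error bound, and stop when no feature improves it. The names used here (bound, greedy_bscm, the boolean feature-matrix encoding) are illustrative assumptions, and bound is a simple surrogate penalizing classifier size and training errors, not the paper's actual bound.

    import numpy as np

    def bound(d: int, k: int, m: int) -> float:
        # Hypothetical surrogate: grows with the number of features d and
        # the number of training errors k on a sample of size m. It stands
        # in for the generalization error bound minimized in the paper.
        return (d * np.log(2.0 * m) + (k + 1) * np.log(1.0 * m)) / m

    def greedy_bscm(features: np.ndarray, y: np.ndarray):
        # features: (n_features, m) boolean matrix, features[i, j] = h_i(x_j).
        # y: (m,) labels in {0, 1}; the conjunction predicts 1 only when
        # every chosen feature outputs 1 on the example.
        m = y.shape[0]
        yb = y.astype(bool)
        chosen = []
        pred = np.ones(m, dtype=bool)            # empty conjunction predicts 1
        best = bound(0, int((pred != yb).sum()), m)
        while True:
            cand, cand_pred = None, None
            for i in range(features.shape[0]):
                if i in chosen:
                    continue
                p = pred & features[i]           # conjunction with feature i
                b = bound(len(chosen) + 1, int((p != yb).sum()), m)
                if b < best:
                    cand, cand_pred, best = i, p, b
            if cand is None:                     # no feature improves the bound
                return chosen, best
            chosen.append(cand)
            pred = cand_pred

In outline, BBSCM replaces this greedy choice with a branch and bound search over feature subsets, pruning any branch that provably cannot beat the best bound found so far, which is what yields the optimality guarantee; BBSCM(τ) prunes more aggressively and retains only the guarantee of being within a factor τ of the optimal bound.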

Similar Articles

Learning with Decision Lists of Data-Dependent Features

We present a learning algorithm for decision lists which allows features constructed from the data and a trade-off between accuracy and complexity. We provide bounds on the generalization error of this learning algorithm in terms of the number of errors and the size of the classifier it finds on the training data. We also compare its performance on some natural data sets with th...
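
As a minimal illustration of the hypothesis class involved, the sketch below evaluates a decision list: an ordered sequence of (feature, label) rules in which the first feature that fires determines the prediction, with a default label at the end. Encoding rules as Python callables is an illustrative assumption, not the paper's representation.

    def predict_decision_list(rules, default, x):
        # rules: ordered list of (feature, label) pairs; the first feature
        # that fires on x decides the prediction, else the default is used.
        for feature, label in rules:
            if feature(x):
                return label
        return default

    # Toy usage with threshold features standing in for the
    # data-dependent features constructed by the algorithm.
    rules = [(lambda x: x < 0.2, 1), (lambda x: x > 0.8, 1)]
    print(predict_decision_list(rules, 0, 0.1))   # -> 1 (first rule fires)
    print(predict_decision_list(rules, 0, 0.5))   # -> 0 (default label)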

The Set Covering Machine with Data-Dependent Half-Spaces

We examine the set covering machine when it uses data-dependent half-spaces for its set of features and bound its generalization error in terms of the number of training errors and the number of half-spaces it achieves on the training data. We show that it provides a favorable alternative to data-dependent balls on some natural data sets. Compared to the support vector machine, the set covering...
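
One simple way to turn a pair of training points into a data-dependent half-space feature is sketched below: the normal is the difference of the two points and the threshold sits at their midpoint. This particular construction is an assumption made for illustration; the paper defines its half-spaces from training examples by its own rule.

    import numpy as np

    def halfspace_feature(a: np.ndarray, b: np.ndarray):
        # Binary feature from a half-space whose normal is b - a,
        # thresholded at the midpoint between the two training points.
        w = b - a
        t = w @ (a + b) / 2.0
        return lambda x: 1 if x @ w > t else 0

    # Usage: build a feature from two training points and evaluate it.
    a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
    h = halfspace_feature(a, b)
    print(h(np.array([2.0, 2.0])), h(np.array([-1.0, -1.0])))   # 1 0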

Bounds in Terms of Rademacher Averages

So far we have seen how to obtain generalization error bounds for learning algorithms that pick a function from a function class of limited capacity or complexity, where the complexity of the class is measured using the growth function or VC-dimension in the binary case, and using covering numbers or the fat-shattering dimension in the real-valued case. These complexity measures however do not t...
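
For reference, the empirical Rademacher average of a function class F on a sample x_1, ..., x_m (notation chosen here) is

    \hat{R}_m(F) = \mathbb{E}_{\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(x_i) \right],
    \qquad \sigma_1, \dots, \sigma_m \ \text{i.i.d. uniform on } \{-1, +1\},

and one standard bound of this type states that, with probability at least 1 - \delta over the sample, every f in F taking values in [0, 1] satisfies

    \mathbb{E}[f(x)] \le \frac{1}{m} \sum_{i=1}^{m} f(x_i) + 2\,\mathcal{R}_m(F) + \sqrt{\frac{\ln(1/\delta)}{2m}},

where \mathcal{R}_m(F) = \mathbb{E}[\hat{R}_m(F)] is the expected Rademacher average. Unlike the VC dimension, this quantity is measured on the actual sample and distribution, which is what makes the resulting bounds data dependent.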

Algorithmic Stability and Regularization Algorithms in an RKHS

In the last few lectures we have seen a number of different generalization error bounds for learning algorithms, using notions such as the growth function and VC dimension; covering numbers, pseudo-dimension, and fat-shattering dimension; margins; and Rademacher averages. While these bounds are different in nature and apply in different contexts, a unifying factor that they all share is that tha...
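
A canonical regularization algorithm in an RKHS, and a standard target of stability-based bounds, is kernel ridge regression. Below is a minimal sketch, with the kernel, bandwidth and regularization strength chosen arbitrarily for illustration.

    import numpy as np

    def kernel_ridge_fit(K: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
        # Minimizes (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2 over the
        # RKHS. By the representer theorem f = sum_i alpha_i k(x_i, .), and
        # the coefficients solve (K + lam * m * I) alpha = y.
        m = K.shape[0]
        return np.linalg.solve(K + lam * m * np.eye(m), y)

    # Toy usage: Gaussian kernel on a 1-D sample (illustrative parameters).
    X = np.linspace(0.0, 1.0, 20)[:, None]
    y = np.sin(4.0 * X[:, 0])
    K = np.exp(-((X - X.T) ** 2) / 0.1)
    alpha = kernel_ridge_fit(K, y, lam=1e-3)
    print(np.abs(K @ alpha - y).max())   # small residual on the training sample

The regularization term is what delivers uniform stability: changing one training example perturbs the learned function by an amount controlled by lam, which is the property the stability-based generalization bounds exploit.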

Learning with the Set Covering Machine

We generalize the classical algorithms of Valiant and Haussler for learning conjunctions and disjunctions of Boolean attributes to the problem of learning these functions over arbitrary sets of features, including features that are constructed from the data. The result is a general-purpose learning machine, suitable for practical learning tasks, that we call the Set Covering Machine. We present...
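
For contrast with the bound-driven training in the main paper, the sketch below gives the flavour of a classical SCM greedy step for a conjunction: each round adds the feature with the highest utility, trading negatives newly covered against positives newly misclassified via a penalty parameter p, which is exactly the kind of parameter the bound-minimizing approach above avoids estimating. The encoding and stopping rules here are simplified assumptions, not the authors' exact algorithm.

    import numpy as np

    def scm_greedy(features: np.ndarray, y: np.ndarray,
                   p: float = 1.0, max_features: int = 10):
        # features: (n_features, m) matrix in {0, 1}; the conjunction
        # predicts positive only if every chosen feature outputs 1.
        # A feature "covers" a negative example by outputting 0 on it.
        uncovered = {j for j in range(y.size) if y[j] == 0}   # negatives left
        alive_pos = {j for j in range(y.size) if y[j] == 1}   # positives kept
        chosen = []

        def utility(i):
            covers = sum(features[i, j] == 0 for j in uncovered)
            errors = sum(features[i, j] == 0 for j in alive_pos)
            return covers - p * errors

        while uncovered and len(chosen) < max_features:
            best = max(range(features.shape[0]), key=utility)
            if utility(best) <= 0:
                break                          # no feature is worth adding
            chosen.append(best)
            uncovered = {j for j in uncovered if features[best, j] != 0}
            alive_pos = {j for j in alive_pos if features[best, j] != 0}
        return chosen

In the classical setting, p and the maximum number of features are typically tuned by cross-validation; replacing that tuning with direct bound minimization is the starting point of the paper above.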

Publication date: 2007